Topology of the conceptual network of language

We define two words in a language to be connected if they express similar concepts. The network
of connections among the many thousands of words that make up a language is important not only 
for the study of the structure and evolution of languages, but also for cognitive science. We study 
this issue quantitatively, by mapping out the conceptual network of the English language, with 
the connections being defined by the entries in a Thesaurus dictionary. We find that this network 
presents a small-world structure, with an amazingly small average shortest path, and appears to 
exhibit an asymptotic scale-free feature with algebraic connectivity distribution. 

PACS numbers: 87.23.Ge,89.75.Hc 



Any language is composed of many thousands of words 
linked together in an apparently fairly sophisticated way. 
A language can thus be regarded as a network, in the 
following sense: (1) the words correspond to nodes of 
the network, and (2) a link exists between two words if 
they express similar concepts. Clearly, the underlying 
network of a language is necessarily sparse in the sense 
that the average number of links per node is typically 
much smaller than the total number of nodes. Identify- 
ing and understanding the common network topology of 
languages is of great importance, not only for the study 
of languages themselves, but also for cognitive science 
where one of the most fundamental issues concerns as- 
sociative memory, which is intimately related to the net- 
work topology. 

Recently, there has been a tremendous amount of interest
in the study of large, sparse, and complex networks
since the seminal papers by Watts and Strogatz [1] on the
small-world characteristic and by Barabasi and Albert on
scale-free features. The small-world concept is static
in the sense that it describes the topological property
of the network at a given time. Two statistical quantities
characterizing a static network are the clustering C
and the shortest path L, where the former is the probability
that any two nodes are connected to each other, given
that they are both connected to a common node, and the
latter measures the minimal number of links connecting
two nodes in the network. Regular networks have high
clustering and large average shortest paths, while random
networks, at the opposite end of the spectrum, have
small shortest paths and low clustering. Small-world
networks fall somewhere in between these two extremes.
In particular, a network is small world if its clustering
coefficient is almost as high as that of a regular network
but its average shortest path is almost as small as that
of a random network with the same parameters. Watts
and Strogatz demonstrated that a small-world network 
can be easily constructed by adding to a regular network 
a few additional random links connecting otherwise dis- 
tant nodes. The scale-free property, on the other hand, is 
defined by an algebraic behavior in the probability dis- 
tribution P(k) of k, the number of links at a node in 
the network. This property is dynamic because it is the 
consequence of the natural evolution of the network. The 
ground-breaking work by Barabasi and Albert demonstrates
that the algebraic distribution in the connectivity
of scale-free networks is caused by two basic factors in the
temporal evolution of the network: growth and preferen- 
tial attachment, where the former means that the number 
of nodes in the network keeps increasing and the latter 
stipulates that the probability for a new node to be con- 
nected to an existing node is proportional to the number 
of links that this node already has. The scale-free prop- 
erty appears to be universal for many networks and most 
of the scale-free networks are also small world. As of 
today, the small-world and scale-free features have been 
discovered in many networks in nature, and there has 
also been a large number of theoretical models proposed 
to explain these features.
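
As an illustration of the two quantities defined above, the following minimal sketch computes C and L for a small regular ring lattice. It assumes that C is taken as the node-averaged fraction of linked neighbor pairs (the Watts-Strogatz convention) and that L is averaged over all connected pairs; the function names and the lattice size are hypothetical choices made for this example only.

```python
from collections import deque


def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        if pairs == 0:
            continue  # nodes with fewer than two neighbours contribute zero
        linked = sum(
            1
            for a in range(len(nbrs))
            for b in range(a + 1, len(nbrs))
            if nbrs[b] in adj[nbrs[a]]
        )
        total += linked / pairs
    return total / len(adj)


def average_shortest_path(adj):
    """Mean breadth-first-search distance over all connected ordered pairs."""
    dist_sum, pair_count = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pair_count += len(dist) - 1
    return dist_sum / pair_count


lattice = ring_lattice(20, 4)
print(clustering(lattice), average_shortest_path(lattice))
```

For a regular lattice the clustering is high and the path length grows linearly with the ring size, which is the regime that the small-world construction shortens by adding a few long-range links.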

In this paper, we study the network structure of language.
We present results for the English language,
but they are expected to hold for other languages as well,
because the fundamental role of language, i.e., to
communicate ideas, is shared by all languages. We
construct a conceptual network from the entries in a Thesaurus
dictionary and consider two words connected if
they express similar concepts. The network is clearly
evolving and sparse. We argue that this network exhibits
the small-world property as a result of natural optimization
and that, interestingly, the network is asymptotically
scale-free due to its dynamic character. We believe
and shall argue that these findings are important not only
for linguistics, but also for cognitive science.

FIG. 1: Illustration of the connections in the conceptual network
for a few words. The thick line is the shortcut between
the words "universe" and "character", which are connected
by "nature".

A Thesaurus dictionary gives for every entry a list of 
words that are conceptually similar to the entry word. 
For example, the list for the word "nature" includes "universe",
"world", and "character". We define a network
from this in a natural way, where each word is a node, 
and two nodes are connected if one of the corresponding 
words is listed in the entry of the other one. To build this 
network, we use an online English Thesaurus dictionary 
that is freely available, which has over 30,000 entries,
and lists on average over 100 words per entry. The words 
that have an entry in the dictionary are called root words. 
Not all words in the list of a given root word are them- 
selves root words. In the construction of the network, 
only words that are root words are considered, and the 
others are dropped. The resulting network has an aver- 
age of about 60 connections per node. This number is 
much less than the total number of nodes, and thus we 
are dealing with a sparse network, where each node is 
connected to only a small fraction of the network. This 
is a necessary condition for the notion of small world to 
make sense. The construction of the network is depicted 
in Fig. 1.
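
The construction rule can be sketched in a few lines. This is only an illustration under stated assumptions: the thesaurus is represented as a hypothetical dictionary mapping each root word to its listed words, and the toy entries below (including the non-root words "cosmos" and "personality") are invented for the example rather than taken from the actual dictionary.

```python
def build_network(thesaurus):
    """Return an undirected adjacency map restricted to root words."""
    roots = set(thesaurus)
    adj = {word: set() for word in roots}
    for root, listed in thesaurus.items():
        for word in listed:
            # Listed words without an entry of their own are dropped.
            if word in roots and word != root:
                adj[root].add(word)
                adj[word].add(root)  # a link exists if either entry lists the other
    return adj


# Hypothetical toy entries; "cosmos" and "personality" are not root words here.
toy_thesaurus = {
    "nature": ["universe", "world", "character", "cosmos"],
    "universe": ["world", "nature"],
    "world": ["universe"],
    "character": ["personality"],
}

network = build_network(toy_thesaurus)
average_degree = sum(len(nbrs) for nbrs in network.values()) / len(network)
print(network)
print(average_degree)
```

Note that "character" ends up linked to "nature" even though its own entry does not list "nature", because a link is created when either word appears in the entry of the other.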

We first present results concerning the small-world
property of the network. We expect the network to be
highly clustered, because there are many sets of related
words that are highly interconnected. For example, "nature"
is connected to "universe", and is also connected to
"world", and "world" and "universe" are connected. The
numerical calculation of C yields 0.53, which is compared
in Table I with the corresponding value for a random network
with the same parameters, in which the clustering
approaches zero, since the probability that two nodes are
connected is independent of whether they are connected
to a common node or not. We see that in fact C is more
than 250 times larger than the random network value
computed from the relation C = k/(N - 1). On the
other hand, because each word is linked to only 60 others
(on average), compared to over 30,000 in total, and
since only words expressing similar concepts are linked,
one might be tempted to conclude that L should be large,
and that one might need to cross hundreds or even thousands
of links to go from one word to another with a very
different meaning. However, a calculation of L yields the
amazingly low number of 3.2, which is very close to the
value of about 2.5 of the corresponding random network
estimated from the relation L ~ ln N / ln k, as shown
in Table I. This means that one needs only about 3 steps on
average to connect any two words in the 30,000-word
dictionary.

TABLE I: Results for the conceptual network defined by the
Thesaurus dictionary, and a comparison with a corresponding
random network with the same parameters. N is the total
number of nodes (root words), k is the average number of
links per node, C is the clustering coefficient, and L is the
average shortest path.

                        N        k      C       L
Actual configuration    30,244   59.9   0.53    3.16
Random configuration    30,244   59.9   0.002   2.5
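
The random-network reference values can be checked directly from the two relations quoted in the text, C = k/(N - 1) and L ~ ln N / ln k, using the N and k reported in Table I. The short sketch below does only this arithmetic; the variable names are chosen for the example.

```python
import math

N = 30244        # number of root words (Table I)
k = 59.9         # average number of links per node (Table I)
C_actual = 0.53  # measured clustering coefficient (Table I)

C_random = k / (N - 1)                 # clustering of a random network
L_random = math.log(N) / math.log(k)   # estimated average shortest path
ratio = C_actual / C_random            # how much more clustered the real network is

print(f"C_random = {C_random:.4f}")
print(f"L_random = {L_random:.2f}")
print(f"C_actual / C_random = {ratio:.0f}")
```

The values come out at about 0.002 for the clustering and about 2.5 for the path length, and the ratio of measured to random clustering exceeds 250, in agreement with the statements above.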

The reason why the average shortest path for the conceptual
language network is so low is related to the existence
of words that correspond to two or more very different
concepts. For example, "nature" is connected to
"universe", but it is also connected to "character". Thus,
two words with meanings as distinct as "universe"
and "character" are separated by only 2 links in the network
(cf. Fig. 1). The word "nature" is thus a shortcut
that connects regions of the network that would otherwise
be separated by many links. The presence of such
shortcuts is what makes L small. In fact, fewer than 1 percent
of the words require more than 4 steps to be reached
from any given word, on average, as shown in Table II.
For example, one can reach any other word starting from
"nature" in 5 steps or fewer.

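The shell sizes reported in Table II can be used to cross-check the average shortest path and the fraction of words lying more than 4 steps away. The sketch below simply takes the tabulated N_n values as input; the variable names are chosen for the example, and small deviations from the quoted L are expected because the table entries are rounded.

```python
# Average number of nodes N_n at distance n from a given node (Table II).
shell_sizes = {1: 59.9, 2: 2961, 3: 19762, 4: 7205, 5: 222, 6: 28.5, 7: 4.7, 8: 0.06}

reachable = sum(shell_sizes.values())  # should be close to N - 1
mean_distance = sum(n * count for n, count in shell_sizes.items()) / reachable
beyond_four = sum(count for n, count in shell_sizes.items() if n > 4) / reachable

print(f"nodes reached      = {reachable:.1f}")
print(f"mean distance      = {mean_distance:.2f}")
print(f"fraction beyond 4  = {beyond_four:.4f}")
```

The shells add up to roughly N - 1 nodes, the weighted mean distance is close to the value of L in Table I, and the fraction of nodes beyond 4 steps stays below 1 percent.
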
Our first result is thus that the conceptual network 
is highly clustered and at the same time has a very 
small length, i.e., it is a small-world network. Since the 
length L in small-world networks grows only logarithmically
with the number of nodes, even if we included
more words in the dictionary (and consequently more 
nodes), L would not change by much, and our conclu- 
sions still hold. Another important point is that even 
though we used the dictionary of a particular language 
(English), since the Thesaurus associates words based on
their concepts, we expect similar results to hold for other 
languages as well. In fact, in any language the network 
will be highly clustered, and any language has words that 
function as shortcuts, guaranteeing that L is very small, 
even though the particular words that act as shortcuts 
may be different for different languages. 

TABLE II: Average number N_n of nodes at a shortest path
L = n from a given node in the conceptual network; p =
N_n/N is the fraction of nodes corresponding to N_n.

n    N_n      p
1    59.9     0.002
2    2,961    0.098
3    19,762   0.653
4    7,205    0.238
5    222      0.007
6    28.5     0.001
7    4.7      ~10^-4
8    0.06     ~10^-6

Next we consider the dynamical feature of the conceptual
network. The language is an evolving system,
where new words are continually created and added to 
the network. The conceptual network of language can 
thus be regarded as a growing network. But, how are 
the new nodes attached in the conceptual network? The 
answer is encoded in the probability distribution P(k) 
of the connectivity. If new nodes are randomly added
to the network, P(k) follows an exponential distribution:
P(k) \sim \exp(-\beta k). If new nodes are preferentially
added to the network, e.g., if the probability \Pi_i for an
already existing node i to acquire a link from the new
node is proportional to k_i, the number of links that node
i already has, then P(k) exhibits the following algebraic
scaling:

    P(k) \sim k^{-\alpha},    (1)

where \alpha = 3. The algebraic scaling law (1) reflects the
fact that there is a self-organizing principle governing the
growth of the network, which has indeed been discovered
in many realistic networks [5]. For our conceptual
network of language, we expect the distribution P(k) to
reflect the intrinsically coherent manner by which a language
is supposed to evolve. However, the rule of a perfect
preferential attachment \Pi_i \sim k_i appears to be too
idealized as there are also random factors affecting how
a new word is added to the language. We thus hypothesize
that for the conceptual network of language, a new
node is added to the network with both preferential and
random attachments. Specifically, we assume,



    \Pi_i \sim (1 - p) k_i + p,    (2)

where p and (1 - p) are the weights of random and preferential
attachments, respectively. A recent work indicates
that the attachment rule (2) leads to the following
connectivity distribution:



    P(k) \sim \left( k + \frac{p}{1-p} \right)^{-\gamma},  \gamma = 3 + \frac{p}{m(1-p)},    (3)

where m is the number of new links added to the network
at each time step. We see that for small k, P(k) exhibits
an approximately exponential behavior, while for large k,
P(k) appears to be algebraic with an exponent greater
than 3. We then expect to observe a crossover from the
exponential to algebraic behavior as k is increased. This
indeed appears to be the case for the conceptual network
of language, as shown in Fig. 2, where the asymptotic algebraic
scaling exponent is about 3.5, which is consistent
with the theoretical prediction in Eq. (3). This indicates
that our hypothesis of mixed contributions from
preferential and random attachments in the development
of the conceptual network of language is plausible, and
there is indeed a self-organized structure in the network
to a certain degree.

FIG. 2: Algebraic scaling behavior of P(k) for the conceptual
network of the English language. The inset shows the initially
exponential decay of P(k).
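
The mixed attachment rule can be explored numerically with a small growth simulation. The following is a minimal sketch under stated assumptions, not the procedure used to obtain Fig. 2: the seed graph (m + 1 fully connected nodes), the rejection of duplicate targets, and the parameter values are hypothetical choices for illustration. Each new node picks m existing nodes with weight (1 - p) k_i + p, implemented as a mixture of a uniform draw and a degree-proportional draw.

```python
import random
from collections import Counter


def predicted_exponent(p, m):
    """Asymptotic exponent gamma = 3 + p / (m (1 - p)); requires p < 1."""
    return 3.0 + p / (m * (1.0 - p))


def grow_network(n_nodes, m, p, seed=0):
    """Grow a network in which node i gains a link with weight (1 - p) k_i + p."""
    rng = random.Random(seed)
    degree = [m] * (m + 1)  # assumed seed graph: m + 1 fully connected nodes
    # Node i appears k_i times, so a uniform draw from this list is degree-proportional.
    endpoints = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_nodes):
        n_old = new  # number of nodes present before this step
        # Share of the total weight carried by the random (uniform) term.
        uniform_share = p * n_old / ((1.0 - p) * len(endpoints) + p * n_old)
        targets = set()
        while len(targets) < m:  # duplicates are rejected (a simplification)
            if rng.random() < uniform_share:
                targets.add(rng.randrange(n_old))   # random attachment
            else:
                targets.add(rng.choice(endpoints))  # preferential attachment
        degree.append(m)
        for t in targets:
            degree[t] += 1
            endpoints.extend((t, new))
    return degree


degrees = grow_network(n_nodes=2000, m=3, p=0.5)
histogram = Counter(degrees)  # histogram[k] = number of nodes with k links
print(sorted(histogram.items())[:5])
print(predicted_exponent(p=0.5, m=3))
```

Setting p = 0 recovers pure preferential attachment with exponent 3, while increasing p raises the predicted exponent above 3 and strengthens the exponential behavior at small k.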

A heuristic justification for our hypothesis (2) is as follows.
Because of the small-world topology, each node of
the conceptual network on average has a large fraction
of local connections and a small fraction of long-range
connections. When a new node is added to the network,
it has the same probability of attaching to any one of
the already existing nodes. But, once it attaches to a node
j, it has the tendency to connect preferentially to the
nodes that are already connected to j [10]. Preferential
attachment comes from the second step, since the probability
that a node i is in the neighborhood of node j is
proportional to the number of links k_i of node i; while
the random component comes from the random choice
of the first connection j and the subsequent long-range
connections. The small-world property is consistent with
the evolutionary character of the network, as the growing 
process tends to keep high clustering and small shortest 
path. 
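
The bias toward well-connected nodes in the second step of this argument can be made concrete on a toy graph. The sketch below computes the exact probability of reaching node i by first choosing a node j uniformly and then choosing one of the neighbors of j uniformly. The star graph and the function name are hypothetical choices for the example. The proportionality to k_i stated above holds on average when neighbor degrees are uncorrelated; on a tiny graph such as this one the selection is biased toward the high-degree node without being exactly linear in k_i.

```python
from fractions import Fraction


def neighbor_pick_probability(adj):
    """Exact probability of reaching each node i via a random neighbour of a random node j."""
    n = len(adj)
    prob = {node: Fraction(0) for node in adj}
    for j, nbrs in adj.items():
        for i in nbrs:
            # j is chosen with probability 1/n, then i among the k_j neighbours of j.
            prob[i] += Fraction(1, n) * Fraction(1, len(nbrs))
    return prob


# Hypothetical toy graph: node 0 is a hub linked to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
prob = neighbor_pick_probability(star)
print(prob)
```

In this example the hub is reached far more often than any leaf, whereas a purely random first-step choice would select every node with the same probability 1/5.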

In comparison with the small-world model originally
proposed in Ref. [1], a scale-free network presents a
highly heterogeneous distribution of links per node. In
spite of this, the evolution of the conceptual network is
demonstrated to be robust, in that most of the words
correspond to nodes connected to few other nodes, and
can be removed without affecting the structure of the
network. There are also a few highly connected words, which are the most
visible ones, but they are unlikely to be suddenly lost or
undergo an abrupt transformation in the evolution without
a self-organized reconnection of the neighbors.

We conclude with some thoughts on the meaning of 
our results for cognitive science. It is well known that 
human memory is associative, which means that infor- 
mation is retrieved by connecting similar concepts, just 
as in our network above [11]. From the standpoint
of retrieval of information in an associative memory, the
small-world property of the network represents a maximization
of efficiency: on the one hand, similar pieces of
information are stored together, due to the high clustering,
which makes searching by association possible; on
the other hand, even very different pieces of information
are never separated by more than a few links, or associations,
which guarantees a fast search. We thus speculate
that associative memory has arisen partly because of a
maximization of efficiency in the retrieval by natural selection.
This issue may be related to the fact that the
neural network is probably a small-world network as well
[15, 16], which is probably necessary for the brain to be
able to hold a conceptual network that is needed for associative
memory.